Search | VHL Regional Portal

1.

AZGP1 deficiency promotes angiogenesis in prostate cancer.

Wen, Ru M; Qiu, Zhengyuan; Marti, G Edward W; Peterson, Eric E; Marques, Fernando Jose Garcia; Bermudez, Abel; Wei, Yi; Nolley, Rosalie; Lam, Nathan; Polasko, Alex LaPat; Chiu, Chun-Lung; Zhang, Dalin; Cho, Sanghee; Karageorgos, Grigorios Marios; McDonough, Elizabeth; Chadwick, Chrystal; Ginty, Fiona; Jung, Kyeong Joo; Machiraju, Raghu; Mallick, Parag; Crowley, Laura; Pollack, Jonathan R; Zhao, Hongjuan; Pitteri, Sharon J; Brooks, James D.

J Transl Med ; 22(1): 383, 2024 Apr 24.

Article in English | MEDLINE | ID: mdl-38659028

ABSTRACT

BACKGROUND: Loss of AZGP1 expression is a biomarker associated with progression to castration resistance, development of metastasis, and poor disease-specific survival in prostate cancer. However, high AZGP1 expression in prostate cancer cells has been reported to increase proliferation and invasion. The exact role of AZGP1 in prostate cancer progression remains elusive. METHOD: AZGP1 knockout and overexpressing prostate cancer cells were generated using a lentiviral system. The effects of AZGP1 under- or over-expression in prostate cancer cells were evaluated by in vitro cell proliferation, migration, and invasion assays. Heterozygous AZGP1+/- mice were obtained from the European Mouse Mutant Archive (EMMA), and prostate tissues from homozygous knockout male mice were collected at 2, 6 and 10 months for histological analysis. In vivo xenografts generated from AZGP1 under- or over-expressing prostate cancer cells were used to determine the role of AZGP1 in prostate cancer tumor growth, and subsequent proteomics analysis was conducted to elucidate the mechanisms of AZGP1 action in prostate cancer progression. AZGP1 expression and microvessel density were measured in human prostate cancer samples on a tissue microarray of 215 independent patient samples. RESULT: Neither knockout nor overexpression of AZGP1 had significant effects on prostate cancer cell proliferation, clonal growth, migration, or invasion in vitro. The prostates of AZGP1-/- mice initially appeared to have grossly normal morphology; however, by 6 months we observed fibrosis in the periglandular stroma and higher blood vessel density in the mouse prostate. In PC3 and DU145 mouse xenografts, over-expression of AZGP1 did not affect tumor growth. Instead, these tumors displayed decreased microvessel density compared to xenografts derived from PC3 and DU145 control cells, suggesting that AZGP1 functions to inhibit angiogenesis in prostate cancer.
Proteomics profiling further indicated that, compared to control xenografts, AZGP1 overexpressing PC3 xenografts are enriched with angiogenesis pathway proteins, including YWHAZ, EPHA2, SERPINE1, PDCD6, MMP9, GPX1, HSPB1, COL18A1, RNH1, and ANXA1. In vitro functional studies show that AZGP1 inhibits human umbilical vein endothelial cell proliferation, migration, tube formation and branching. Additionally, tissue microarray analysis shows that AZGP1 expression is negatively correlated with blood vessel density in human prostate cancer tissues. CONCLUSION: AZGP1 is a negative regulator of angiogenesis, such that loss of AZGP1 promotes angiogenesis in prostate cancer. AZGP1 likely exerts heterotypic effects on cells in the tumor microenvironment, such as stromal and endothelial cells. This study sheds light on the anti-angiogenic characteristics of AZGP1 in the prostate and provides a rationale to target AZGP1 to inhibit prostate cancer progression.

Subject(s)

Cell Movement , Cell Proliferation , Neovascularization, Pathologic , Prostatic Neoplasms , Male , Animals , Prostatic Neoplasms/pathology , Prostatic Neoplasms/genetics , Prostatic Neoplasms/metabolism , Humans , Neovascularization, Pathologic/genetics , Neovascularization, Pathologic/pathology , Cell Line, Tumor , Mice, Knockout , Glycoproteins/metabolism , Neoplasm Invasiveness , Mice , Gene Expression Regulation, Neoplastic , Angiogenesis , Zn-Alpha-2-Glycoprotein

2.

IntLIM 2.0: identifying multi-omic relationships dependent on discrete or continuous phenotypic measurements.

Eicher, Tara; Spencer, Kyle D; Siddiqui, Jalal K; Machiraju, Raghu; Mathé, Ewy A.

Bioinform Adv ; 3(1): vbad009, 2023.

Article in English | MEDLINE | ID: mdl-36922980

ABSTRACT

Motivation: IntLIM uncovers phenotype-dependent linear associations between two types of analytes (e.g. genes and metabolites) in a multi-omic dataset, which may reflect chemically or biologically relevant relationships. Results: The new version of the IntLIM R package adds support for generalized data types, covariate correction, continuous phenotypic measurements, model validation and unit testing. IntLIM analysis uncovered biologically relevant gene-metabolite associations in two separate datasets, and its run time is multiple orders of magnitude faster than that of baseline R functions. Availability and implementation: IntLIM is available as an R package with a detailed vignette (https://github.com/ncats/IntLIM) and as an R Shiny app (see Supplementary Figs S1-S6) (https://intlim.ncats.io/). Supplementary information: Supplementary data are available at Bioinformatics Advances online.
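A phenotype-dependent linear association means the gene-metabolite slope changes with the phenotype. A minimal sketch of that idea, assuming the association is tested through a gene-by-phenotype interaction term in an ordinary least-squares model (metabolite ~ gene + phenotype + gene:phenotype). IntLIM itself is an R package, and its exact model, covariate handling and p-value computation may differ; `interaction_tstat` is a hypothetical name. The phenotype can be a 0/1 group label or a continuous measurement.

```python
import numpy as np

def interaction_tstat(gene, metab, pheno):
    """Fit metab ~ gene + pheno + gene:pheno by ordinary least squares.

    Returns the interaction coefficient and its t-statistic; a large |t|
    suggests the gene-metabolite slope depends on the phenotype.
    """
    g = np.asarray(gene, dtype=float)
    m = np.asarray(metab, dtype=float)
    p = np.asarray(pheno, dtype=float)
    X = np.column_stack([np.ones_like(g), g, p, g * p])  # design matrix
    beta, *_ = np.linalg.lstsq(X, m, rcond=None)
    resid = m - X @ beta
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)                     # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)                # OLS coefficient covariance
    return beta[3], beta[3] / np.sqrt(cov[3, 3])

# Simulated gene-metabolite pair: slope -1 in phenotype 0, slope +2 in phenotype 1.
rng = np.random.default_rng(0)
gene = rng.normal(size=200)
pheno = np.repeat([0, 1], 100)
metab = np.where(pheno == 1, 2 * gene, -gene) + rng.normal(scale=0.5, size=200)
beta3, t3 = interaction_tstat(gene, metab, pheno)
print(f"interaction coefficient {beta3:.2f}, t = {t3:.1f}")
```

The true interaction here is 3 (the slope moves from -1 to +2), so the estimate should land close to 3 with a large t-statistic.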

3.

Deep learning-based automated pipeline for blood vessel detection and distribution analysis in multiplexed prostate cancer images.

Karageorgos, Grigorios M; Cho, Sanghee; McDonough, Elizabeth; Chadwick, Chrystal; Ghose, Soumya; Owens, Jonathan; Jung, Kyeong Joo; Machiraju, Raghu; West, Robert; Brooks, James D; Mallick, Parag; Ginty, Fiona.

Front Bioinform ; 3: 1296667, 2023.

Article in English | MEDLINE | ID: mdl-38323039

ABSTRACT

Introduction: Prostate cancer is a highly heterogeneous disease, presenting varying levels of aggressiveness and response to treatment. Angiogenesis is one of the hallmarks of cancer, providing oxygen and nutrient supply to tumors. Microvessel density has previously been correlated with higher Gleason score and poor prognosis. Manual segmentation of blood vessels (BVs) in microscopy images is challenging, time-consuming and may be prone to inter-rater variability. In this study, an automated pipeline is presented for BV detection and distribution analysis in multiplexed prostate cancer images. Methods: A deep learning model was trained to segment BVs by combining CD31, CD34 and collagen IV images. In addition, the trained model was used to analyze the size and distribution patterns of BVs in relation to disease progression in a cohort of prostate cancer patients (N = 215). Results: The model was capable of accurately detecting and segmenting BVs, as compared to ground truth annotations provided by two reviewers. The precision (P), recall (R) and Dice similarity coefficient (DSC) were equal to 0.93 (SD 0.04), 0.97 (SD 0.02) and 0.71 (SD 0.07) with respect to reviewer 1, and 0.95 (SD 0.05), 0.94 (SD 0.07) and 0.70 (SD 0.08) with respect to reviewer 2, respectively. BV count was significantly associated with 5-year recurrence (adjusted p = 0.0042), while both BV count and BV area were significantly associated with Gleason grade (adjusted p = 0.032 and 0.003, respectively). Discussion: The proposed methodology is anticipated to streamline and standardize BV analysis, offering additional insights into the biology of prostate cancer, with broad applicability to other cancers.
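The precision, recall and Dice similarity coefficient reported above can be illustrated on binary masks. This sketch computes pixel-level versions only; the helper name is hypothetical. Because pixel-level Dice is the harmonic mean of pixel-level precision and recall, values of 0.93 and 0.97 would imply a Dice of about 0.95, so the paper's P and R were likely computed per detected vessel rather than per pixel.

```python
import numpy as np

def segmentation_metrics(pred, truth):
    """Pixel-level precision, recall and Dice for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.sum(pred & truth))    # vessel pixels found
    fp = int(np.sum(pred & ~truth))   # predicted vessel, annotated background
    fn = int(np.sum(~pred & truth))   # annotated vessel that was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    dice = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 1.0
    return precision, recall, dice

truth = np.zeros((4, 4), dtype=bool)
truth[1:3, 1:3] = True    # 4 annotated vessel pixels
pred = np.zeros((4, 4), dtype=bool)
pred[1:3, 1:4] = True     # 6 predicted pixels, 4 of them correct
p, r, d = segmentation_metrics(pred, truth)
print(p, r, d)  # precision 4/6, recall 1.0, Dice 0.8
```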

4.

Improving Compound Activity Classification via Deep Transfer and Representation Learning.

Dey, Vishal; Machiraju, Raghu; Ning, Xia.

ACS Omega ; 7(11): 9465-9483, 2022 Mar 22.

Article in English | MEDLINE | ID: mdl-35350358

ABSTRACT

Recent advances in molecular machine learning, especially deep neural networks such as graph neural networks (GNNs), for predicting structure-activity relationships (SAR) have shown tremendous potential in computer-aided drug discovery. However, the applicability of such deep neural networks is limited by the requirement of large amounts of training data. In order to cope with limited training data for a target task, transfer learning for SAR modeling has been recently adopted to leverage information from data of related tasks. In this work, in contrast to popular parameter-based transfer learning approaches such as pretraining, we develop novel deep transfer learning methods TAc and TAc-fc to leverage source domain data and transfer useful information to the target domain. TAc learns to generate effective molecular features that can generalize well from one domain to another and increase the classification performance in the target domain. Additionally, TAc-fc extends TAc by incorporating novel components to selectively learn feature-wise and compound-wise transferability. We used the bioassay screening data from PubChem and identified 120 pairs of bioassays such that the active compounds in each pair are more similar to each other than to the inactive compounds. Overall, TAc achieves the best performance with an average ROC-AUC of 0.801; it significantly improves the ROC-AUC of 83% of target tasks with an average task-wise performance improvement of 7.102%, compared to the best baseline dmpna. Our experiments clearly demonstrate that TAc achieves significant improvement over all baselines across a large number of target tasks. Furthermore, although TAc-fc achieves slightly worse ROC-AUC on average compared to TAc (0.798 vs 0.801), TAc-fc still achieves the best performance on more tasks in terms of PR-AUC and F1 compared to other methods.
In summary, TAc-fc is also found to be a strong model with competitive or even better performance than TAc on a notable number of target tasks.
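ROC-AUC, the headline metric for TAc and TAc-fc, can be computed from ranks alone: it is the probability that a randomly chosen active compound receives a higher score than a randomly chosen inactive one. A minimal pure-Python sketch, quadratic in the number of compounds and therefore meant for illustration rather than large screens; the labels, scores and function name are made up.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random active outscores a random inactive.

    Ties count as half a win (Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive compound")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 0]          # 1 = active, 0 = inactive
scores = [0.9, 0.4, 0.5, 0.1]  # hypothetical model scores
auc = roc_auc(labels, scores)
print(auc)  # 3 of 4 active/inactive pairs ranked correctly -> 0.75
```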

5.

Autonomous Computing Materials.

Bathe, Mark; Hernandez, Rigoberto; Komiyama, Takaki; Machiraju, Raghu; Neogi, Sanghamitra.

ACS Nano ; 15(3): 3586-3592, 2021 03 23.

Article in English | MEDLINE | ID: mdl-33636971

ABSTRACT

Conventional materials are reaching the limits of their computation, sensing, and data storage capabilities, a situation brought on by the end of Moore's law, myriad sensing applications, and the continuing exponential rise in worldwide data storage demand. Conventional materials are also limited by the controlled environments in which they must operate, their high energy consumption, and their limited capacity to perform simultaneous, integrated sensing, computation, and data storage and retrieval. In contrast, the human brain is capable of multimodal sensing, complex computation, and both short- and long-term data storage simultaneously, with near-instantaneous recall, seamless integration, and minimal energy consumption. Motivated by the brain and the need for revolutionary new computing materials, we recently proposed the data-driven materials discovery framework, autonomous computing materials. This framework aims to mimic the brain's capabilities for integrated sensing, computation, and data storage by programming excitonic, phononic, photonic, and dynamic structural nanoscale materials, without attempting to mimic the unknown implementational details of the brain. If realized, such materials would offer transformative opportunities for distributed, multimodal sensing, computation, and data storage in an integrated manner in biological and other nonconventional environments, including interfacing with biological sensors and computers such as the brain itself.

6.

Self-organizing maps with variable neighborhoods facilitate learning of chromatin accessibility signal shapes associated with regulatory elements.

Eicher, Tara; Chan, Jany; Luu, Han; Machiraju, Raghu; Mathé, Ewy A.

BMC Bioinformatics ; 22(1): 35, 2021 Jan 30.

Article in English | MEDLINE | ID: mdl-33516170

ABSTRACT

BACKGROUND: Assigning chromatin states genome-wide (e.g. promoters, enhancers, etc.) is commonly performed to improve functional interpretation of genomic regions. However, computational methods to assign chromatin state suffer from the following drawbacks: they typically require data from multiple assays, which may not be practically feasible to obtain, and they depend on peak calling algorithms, which require careful parameterization and often exclude the majority of the genome. To address these drawbacks, we propose a novel learning technique built upon the Self-Organizing Map (SOM), the Self-Organizing Map with Variable Neighborhoods (SOM-VN), to learn a set of representative shapes from a single, genome-wide chromatin accessibility dataset and associate each shape with a chromatin state assignment in which a particular regulatory element (RE) is prevalent. These shapes can then be used to assign chromatin state using our workflow. RESULTS: We validate the performance of the SOM-VN workflow on 14 different samples of varying quality, namely one assay each of A549 and GM12878 cell lines and two each of H1 and HeLa cell lines, primary B-cells, and brain, heart, and stomach tissue. We show that SOM-VN learns shapes that are (1) non-random, (2) associated with known chromatin states, (3) generalizable across sets of chromosomes, and (4) associated with magnitude and multimodality. We compare the accuracy of SOM-VN chromatin states against the Clustering Aggregation Tool (CAGT), an unsupervised method that learns chromatin accessibility signal shapes but does not associate these shapes with REs, and we show that overall precision and recall are increased when learning shapes using SOM-VN as compared to CAGT. We further compare enhancer state assignments from SOM-VN in signals above a set threshold to enhancer state assignments from Predicting Enhancers from ATAC-seq Data (PEAS), a deep learning method that assigns enhancer chromatin states to peaks.
We show that the precision-recall area under the curve for the assignment of enhancer states is comparable to that of PEAS. CONCLUSIONS: Our work shows that the SOM-VN workflow can learn relationships between REs and chromatin accessibility signal shape, which is an important step toward the goal of assigning and comparing enhancer state across multiple experiments and phenotypic states.

Subject(s)

Chromatin , Enhancer Elements, Genetic , Promoter Regions, Genetic , Adult , Algorithms , Child, Preschool , Chromatin/genetics , HeLa Cells , Humans , Young Adult
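SOM-VN is built upon the classic Self-Organizing Map. The sketch below shows only that classic baseline, not SOM-VN's variable-neighborhood scheme: a one-dimensional map whose nodes learn representative signal shapes, with a Gaussian neighborhood and a learning rate that both shrink linearly during training. The toy signals, decay schedules and function names are assumptions for illustration.

```python
import numpy as np

def train_som(data, n_nodes, n_iter=500, lr0=0.5, seed=0):
    """Classic 1-D SOM: nodes on a line, Gaussian neighborhood that shrinks over time."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    weights = data[rng.choice(len(data), n_nodes, replace=False)].copy()
    grid = np.arange(n_nodes)                    # node positions on the map
    sigma0 = max(n_nodes / 2, 1.0)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighborhood width decays too
        x = data[rng.integers(len(data))]
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))  # best-matching unit
        h = np.exp(-((grid - bmu) ** 2) / (2.0 * sigma ** 2))
        weights += lr * h[:, None] * (x - weights)          # pull nodes toward x
    return weights

def assign_shapes(data, weights):
    """Label each signal with the index of its nearest learned shape."""
    data = np.asarray(data, dtype=float)
    d = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

# Toy "accessibility signals": a single central peak vs. a flanking bimodal shape.
rng = np.random.default_rng(1)
peak = np.array([0.0, 1.0, 3.0, 1.0, 0.0])
bimodal = np.array([2.0, 0.0, 0.0, 0.0, 2.0])
signals = np.vstack([peak + rng.normal(scale=0.1, size=(50, 5)),
                     bimodal + rng.normal(scale=0.1, size=(50, 5))])
shapes = train_som(signals, n_nodes=2)
labels = assign_shapes(signals, shapes)
print(labels[:5], labels[-5:])
```

With two well-separated shape families, each learned node should settle on one family, so the two groups of signals receive different labels.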

7.

Spatial cell type composition in normal and Alzheimer's human brains is revealed using integrated mouse and human single cell RNA sequencing.

Johnson, Travis S; Xiang, Shunian; Helm, Bryan R; Abrams, Zachary B; Neidecker, Peter; Machiraju, Raghu; Zhang, Yan; Huang, Kun; Zhang, Jie.

Sci Rep ; 10(1): 18014, 2020 10 22.

Article in English | MEDLINE | ID: mdl-33093481

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) resolves heterogeneous cell populations in tissues and helps to reveal single-cell level function and dynamics. In neuroscience, the scarcity of brain tissue is the bottleneck for such studies. Evidence shows that mouse and human share similar cell type marker genes. We hypothesized that scRNA-seq data from mouse brain tissue can be used to complement human data to infer cell type composition in human samples. Here, we supplement the cell type information in human scRNA-seq data with mouse data. The resulting data were used to infer the spatial cellular composition of 3702 human brain samples from the Allen Human Brain Atlas. We then mapped the cell types back to the corresponding brain regions. Most cell types were localized to the correct regions. We also compared the mapping results to those derived from neuronal nuclei locations. They were consistent after accounting for changes in neural connectivity between regions. Furthermore, we applied this approach to Alzheimer's disease (AD) brain data and successfully captured cell pattern changes in AD brains. We believe this integrative approach can address the sample scarcity issue in neuroscience.

Subject(s)

Alzheimer Disease/pathology , Brain/metabolism , Gene Expression Regulation , Microglia/pathology , Neurons/pathology , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Alzheimer Disease/classification , Alzheimer Disease/genetics , Animals , Case-Control Studies , Humans , Mice , Microglia/metabolism , Neurons/metabolism
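Inferring the cell type composition of bulk brain samples from single-cell references is often framed as deconvolution: the bulk expression profile is modeled as a non-negative mixture of cell-type signature profiles. The abstract does not say which estimator the authors used, so this is a generic non-negative least-squares sketch, solved by projected gradient descent, with a hypothetical four-gene signature matrix; it is not the paper's pipeline, and all names are illustrative.

```python
import numpy as np

def nnls_projected_gradient(A, b, n_iter=5000):
    """Minimize ||A x - b||^2 subject to x >= 0 by projected gradient descent."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    AtA, Atb = A.T @ A, A.T @ b
    step = 1.0 / np.linalg.norm(AtA, 2)   # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = np.maximum(0.0, x - step * (AtA @ x - Atb))  # gradient step, then clip at 0
    return x

def cell_type_fractions(signature, bulk):
    """Estimate cell-type proportions of one bulk sample from a genes x cell-types signature."""
    x = nnls_projected_gradient(signature, bulk)
    total = x.sum()
    return x / total if total > 0 else x

# Hypothetical signature: 4 marker genes x 3 cell types (e.g. neuron, astrocyte, microglia).
signature = np.array([[10.0, 0.0, 1.0],
                      [0.0, 8.0, 1.0],
                      [1.0, 1.0, 9.0],
                      [5.0, 5.0, 0.0]])
true_fractions = np.array([0.5, 0.3, 0.2])
bulk = signature @ true_fractions        # noise-free mixture for illustration
fractions = cell_type_fractions(signature, bulk)
print(np.round(fractions, 3))
```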

8.

Metabolomics and Multi-Omics Integration: A Survey of Computational Methods and Resources.

Eicher, Tara; Kinnebrew, Garrett; Patt, Andrew; Spencer, Kyle; Ying, Kevin; Ma, Qin; Machiraju, Raghu; Mathé, Ewy A.

Metabolites ; 10(5)2020 May 15.

Article in English | MEDLINE | ID: mdl-32429287

ABSTRACT

As researchers are increasingly able to collect data on a large scale from multiple clinical and omics modalities, multi-omics integration is becoming a critical component of metabolomics research. Metabolomics researchers therefore need a better understanding of the computational and statistical analysis methods relevant to multi-omics studies. In this review, we discuss common types of analyses performed in multi-omics studies and the computational and statistical methods that can be used for each type of analysis. We pinpoint the caveats and considerations for analysis methods, including required parameters, sample size and data distribution requirements, sources of a priori knowledge, and techniques for the evaluation of model accuracy. Finally, for the types of analyses discussed, we provide examples of the applications of corresponding methods to clinical and basic research. We intend our review to serve as a guide for metabolomics researchers choosing effective techniques for multi-omics analyses relevant to their field of study.

9.

PTR Explorer: An approach to identify and explore Post Transcriptional Regulatory mechanisms using proteogenomics.

Srivastava, Arunima; Sharpnack, Michael; Huang, Kun; Mallick, Parag; Machiraju, Raghu.

Pac Symp Biocomput ; 25: 475-486, 2020.

Article in English | MEDLINE | ID: mdl-31797620

ABSTRACT

Integration of transcriptomic and proteomic data should reveal multi-layered regulatory processes governing cancer cell behaviors. Traditional correlation-based analyses have demonstrated limited ability to identify the post-transcriptional regulatory (PTR) processes that drive the non-linear relationship between transcript and protein abundances. In this work, we devise an integrative approach to explore the variety of post-transcriptional mechanisms that dictate relationships between genes and their corresponding proteins. The proposed workflow uses scatterplot diagnostics (scagnostics), an intuitive technique, to characterize and examine the diverse scatterplots built from transcript and protein abundances in a proteogenomic experiment. The workflow includes representing gene-protein relationships as scatterplots, clustering on geometric scagnostic features of these scatterplots, and finally identifying and grouping the potential gene-protein relationships according to their disposition to various PTR mechanisms. We verify the efficacy of the approach in uncovering possible regulatory mechanisms through comprehensive tests on a synthetic dataset. We also propose a variety of 2D pattern-specific downstream analysis methodologies, such as mixture modeling and mapping of miRNA post-transcriptional effects, to explore each mechanism further. This work suggests that the proposed methodology has the potential to discover and categorize post-transcriptional regulatory mechanisms that manifest in proteogenomic trends. These trends subsequently provide evidence for cancer specificity, miRNA targeting, and identification of regulation impacted by biological functionality and different types of degradation. (Supplementary Material - https://github.com/arunima2/PTRE_PSB_2020).

Subject(s)

MicroRNAs , Proteogenomics , Computational Biology , Gene Expression Regulation , Proteomics
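Scagnostics summarize a scatterplot with a handful of geometric measures. One of them, "monotonic", is defined in the original scagnostics work as the squared Spearman rank correlation, which is why it can pick up non-linear but monotone transcript-protein relationships that a Pearson correlation understates. A minimal pure-Python sketch of that single measure; the PTR Explorer workflow uses a broader set of scagnostic features, and the toy abundances and function names are made up.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant input has no defined correlation")
    return sxy / (sxx * syy) ** 0.5

def monotonic_scagnostic(transcript, protein):
    """'Monotonic' scagnostic: squared Spearman correlation of the scatterplot."""
    return pearson(average_ranks(transcript), average_ranks(protein)) ** 2

transcript = [1.0, 2.0, 3.0, 4.0, 5.0]
protein = [2.0, 4.0, 8.0, 16.0, 32.0]   # monotone but strongly non-linear
print(round(pearson(transcript, protein), 3))     # 0.933
print(monotonic_scagnostic(transcript, protein))  # 1.0
```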

10.

Challenges in proteogenomics: a comparison of analysis methods with the case study of the DREAM proteogenomics sub-challenge.

Eicher, Tara; Patt, Andrew; Kautto, Esko; Machiraju, Raghu; Mathé, Ewy; Zhang, Yan.

BMC Bioinformatics ; 20(Suppl 24): 669, 2019 Dec 20.

Article in English | MEDLINE | ID: mdl-31861998

ABSTRACT

BACKGROUND: Proteomic measurements, which closely reflect phenotypes, provide insights into gene expression regulation and the mechanisms underlying altered phenotypes. Further, integration of data on proteome and transcriptome levels can validate gene signatures associated with a phenotype. However, proteomic data are not as abundant as genomic data, and it is thus beneficial to use genomic features to predict protein abundances when matching proteomic samples or measurements within samples are lacking. RESULTS: We evaluate and compare four data-driven models for prediction of proteomic data from mRNA measured in breast and ovarian cancers using the 2017 DREAM Proteogenomics Challenge data. Our results show that Bayesian network, random forest, LASSO, and fuzzy logic approaches can predict protein abundance levels with median ground truth-predicted correlation values between 0.2 and 0.5. However, the most accurately predicted proteins differ considerably between approaches. CONCLUSIONS: In addition to benchmarking the aforementioned machine learning approaches for predicting protein levels from transcript levels, we discuss challenges and potential solutions in state-of-the-art proteogenomic analyses.

Subject(s)

Proteogenomics , Bayes Theorem , Gene Expression Regulation , Humans , Proteome/analysis , RNA, Messenger/genetics , Transcriptome
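The "median ground truth-predicted correlation" summarizes prediction quality across many proteins in one number. A minimal sketch, assuming the correlation is a Pearson coefficient computed per protein across samples and then summarized by its median; the challenge's official scoring may differ (for example in how missing values are handled), and the matrices and function name are toy assumptions. A protein with constant values produces `nan`.

```python
import numpy as np

def per_protein_correlation(truth, pred):
    """Pearson r between measured and predicted abundance, one value per protein.

    truth, pred: arrays of shape (n_proteins, n_samples).
    """
    t = np.asarray(truth, dtype=float)
    p = np.asarray(pred, dtype=float)
    tc = t - t.mean(axis=1, keepdims=True)   # center each protein across samples
    pc = p - p.mean(axis=1, keepdims=True)
    num = (tc * pc).sum(axis=1)
    den = np.sqrt((tc ** 2).sum(axis=1) * (pc ** 2).sum(axis=1))
    return num / den

truth = np.array([[1.0, 2.0, 3.0, 4.0],
                  [1.0, 2.0, 3.0, 4.0],
                  [4.0, 3.0, 2.0, 1.0]])
pred = np.array([[2.0, 4.0, 6.0, 8.0],    # perfectly correlated
                 [4.0, 3.0, 2.0, 1.0],    # perfectly anti-correlated
                 [4.0, 3.0, 2.0, 1.0]])   # perfectly correlated
r = per_protein_correlation(truth, pred)
print(r, np.median(r))   # per-protein r and their median
```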

11.

Visual Exploration of Neural Document Embedding in Information Retrieval: Semantics and Feature Selection.

Ji, Xiaonan; Shen, Han-Wei; Ritter, Alan; Machiraju, Raghu; Yen, Po-Yin.

IEEE Trans Vis Comput Graph ; 25(6): 2181-2192, 2019 06.

Article in English | MEDLINE | ID: mdl-30892213

ABSTRACT

Neural embeddings are widely used in language modeling and feature generation with superior computational power. In particular, neural document embedding (converting variable-length texts to semantic vector representations) has been shown to benefit widespread downstream applications, e.g., information retrieval (IR). However, its black-box nature makes it difficult to understand how the semantics are encoded and employed. We propose visual exploration of neural document embedding to gain insights into the underlying embedding space and promote its utilization in prevalent IR applications. In this study, we take an IR application-driven view, which is further motivated by biomedical IR in healthcare decision-making, and collaborate with domain experts to design and develop a visual analytics system. This system visualizes neural document embeddings as a configurable document map and enables guidance and reasoning; facilitates exploration of the neural embedding space and identification of salient neural dimensions (semantic features) per task and domain interest; and supports advisable feature selection (semantic analysis) along with instant visual feedback to promote IR performance. We demonstrate the usefulness and effectiveness of this system and present inspiring findings in use cases. This work will help designers/developers of downstream applications gain insights and confidence in neural document embedding, and exploit that to achieve more favorable performance in application domains.

Subject(s)

Information Storage and Retrieval/methods , Machine Learning , Natural Language Processing , Semantics , Cluster Analysis , Humans
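The system lets users select salient embedding dimensions (semantic features) and observe how retrieval changes. A hypothetical sketch of the underlying computation: rank documents by cosine similarity to a query, either on all dimensions or on a user-selected subset. This is not the authors' system; the embeddings, the query and `rank_documents` are invented for illustration.

```python
import numpy as np

def rank_documents(doc_emb, query_emb, dims=None):
    """Rank documents by cosine similarity to a query, optionally on a subset of dimensions."""
    D = np.asarray(doc_emb, dtype=float)
    q = np.asarray(query_emb, dtype=float)
    if dims is not None:            # keep only the selected "semantic features"
        D, q = D[:, dims], q[dims]
    D = D / np.linalg.norm(D, axis=1, keepdims=True)
    q = q / np.linalg.norm(q)
    scores = D @ q                  # cosine similarity per document
    return np.argsort(-scores), scores

docs = np.array([[1.0, 0.0, 1.0],
                 [0.0, 1.0, 3.0]])
query = np.array([1.0, 0.0, 3.0])
order_all, _ = rank_documents(docs, query)
order_sel, _ = rank_documents(docs, query, dims=[0, 1])
print(order_all, order_sel)   # [1 0] [0 1]
```

In this toy example, dimension 2 dominates the full-vector ranking; restricting the comparison to dimensions 0 and 1 reverses the order, which is the kind of effect that interactive feature selection is meant to expose.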

12.

Semantic workflows for benchmark challenges: Enhancing comparability, reusability and reproducibility.

Srivastava, Arunima; Adusumilli, Ravali; Boyce, Hunter; Garijo, Daniel; Ratnakar, Varun; Mayani, Rajiv; Yu, Thomas; Machiraju, Raghu; Gil, Yolanda; Mallick, Parag.

Pac Symp Biocomput ; 24: 208-219, 2019.

Article in English | MEDLINE | ID: mdl-30864323

ABSTRACT

Benchmark challenges, such as the Critical Assessment of Structure Prediction (CASP) and Dialogue for Reverse Engineering Assessments and Methods (DREAM) have been instrumental in driving the development of bioinformatics methods. Typically, challenges are posted, and then competitors perform a prediction based upon blinded test data. Challengers then submit their answers to a central server where they are scored. Recent efforts to automate these challenges have been enabled by systems in which challengers submit Docker containers, a unit of software that packages up code and all of its dependencies, to be run on the cloud. Despite their incredible value for providing an unbiased test-bed for the bioinformatics community, there remain opportunities to further enhance the potential impact of benchmark challenges. Specifically, current approaches only evaluate end-to-end performance; it is nearly impossible to directly compare methodologies or parameters. Furthermore, the scientific community cannot easily reuse challengers' approaches, due to lack of specifics, ambiguity in tools and parameters as well as problems in sharing and maintenance. Lastly, the intuition behind why particular steps are used is not captured, as the proposed workflows are not explicitly defined, making it cumbersome to understand the flow and utilization of data. Here we introduce an approach to overcome these limitations based upon the WINGS semantic workflow system. Specifically, WINGS enables researchers to submit complete semantic workflows as challenge submissions. By submitting entries as workflows, it then becomes possible to compare not just the results and performance of a challenger, but also the methodology employed. This is particularly important when dozens of challenge entries may use nearly identical tools, but with only subtle changes in parameters (and radical differences in results). 
WINGS uses a component-driven workflow design and offers intelligent parameter and data selection by reasoning about data characteristics. This proves to be especially critical in bioinformatics workflows, where using default or incorrect parameter values can drastically alter results. Different challenge entries may be readily compared through the use of abstract workflows, which also facilitate reuse. WINGS is housed on a cloud-based setup, which stores data, dependencies and workflows for easy sharing and utility. It also has the ability to scale workflow executions using distributed computing through the Pegasus workflow execution system. We demonstrate the application of this architecture to the DREAM proteogenomic challenge.

Subject(s)

Benchmarking/methods , Semantics , Workflow , Algorithms , Computational Biology/methods , Gene Expression Profiling/statistics & numerical data , Genomics , Metadata , Proteins/genetics , Proteins/metabolism , Reproducibility of Results , Sequence Analysis, RNA/statistics & numerical data

13.

Integrative cancer patient stratification via subspace merging.

Ding, Hao; Sharpnack, Michael; Wang, Chao; Huang, Kun; Machiraju, Raghu.

Bioinformatics ; 35(10): 1653-1659, 2019 05 15.

Article in English | MEDLINE | ID: mdl-30329022

ABSTRACT

MOTIVATION: Technologies that generate high-throughput omics data are flourishing, creating enormous, publicly available repositories of multi-omics data. As many data repositories continue to grow, there is an urgent need for computational methods that can leverage these data to create comprehensive clusters of patients with a given disease. RESULTS: Our proposed approach creates a patient-to-patient similarity graph for each data type as an intermediate representation of each omics data type and merges the graphs through subspace analysis on a Grassmann manifold. We hypothesize that this approach generates more informative clusters by preserving the complementary information from each level of omics data. We applied our approach to The Cancer Genome Atlas (TCGA) breast cancer dataset and show that by integrating gene expression, microRNA and DNA methylation data, our proposed method can produce clinically useful subtypes of breast cancer. We then investigate the molecular characteristics underlying these subtypes. We discover a highly expressed cluster of genes on chromosome 19p13 that strongly correlates with survival in TCGA breast cancer patients and validate these results in three additional breast cancer datasets. We also compare our approach with previous integrative clustering approaches and obtain comparable or superior results. AVAILABILITY AND IMPLEMENTATION: https://github.com/michaelsharpnack/GrassmannCluster. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Algorithms , Breast Neoplasms , Cluster Analysis , DNA Methylation , Genome , Humans
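The abstract describes building one patient-to-patient similarity graph per omics type and merging the graphs through subspace analysis on a Grassmann manifold. A minimal sketch, assuming the commonly used formulation in which each graph contributes its normalized Laplacian L_i and the span U_i of its k smallest eigenvectors, the merged operator is L_mod = Σ L_i − α Σ U_i U_iᵀ, and patients are clustered with k-means on the row-normalized k smallest eigenvectors of L_mod. The GrassmannCluster repository may construct the graphs or choose α differently; the similarity matrices, α = 0.5 and all function names here are hypothetical.

```python
import numpy as np

def normalized_laplacian(W):
    """L = I - D^{-1/2} W D^{-1/2} for a symmetric, non-negative similarity matrix W."""
    W = np.asarray(W, dtype=float)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    return np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

def smallest_eigvecs(M, k):
    _, vecs = np.linalg.eigh(M)      # eigh returns eigenvalues in ascending order
    return vecs[:, :k]

def merge_subspaces(similarities, k, alpha=0.5):
    """Merge per-omics patient graphs: L_mod = sum_i L_i - alpha * sum_i U_i U_i^T."""
    n = len(similarities[0])
    L_mod = np.zeros((n, n))
    for W in similarities:
        L = normalized_laplacian(W)
        U = smallest_eigvecs(L, k)    # k-dim spectral subspace of this data type
        L_mod += L - alpha * (U @ U.T)
    U = smallest_eigvecs(L_mod, k)    # merged subspace
    return U / np.linalg.norm(U, axis=1, keepdims=True)

def kmeans(X, k, n_iter=50):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[dist.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels

def block_similarity(groups, within, between):
    """Toy patient-similarity matrix with two blocks (hypothetical data)."""
    g = np.asarray(groups)
    W = np.where(g[:, None] == g[None, :], within, between)
    np.fill_diagonal(W, 0.0)
    return W

groups = [0, 0, 0, 1, 1, 1]
W_expr = block_similarity(groups, 0.9, 0.1)   # e.g. a gene expression graph
W_meth = block_similarity(groups, 0.8, 0.3)   # e.g. a DNA methylation graph
embedding = merge_subspaces([W_expr, W_meth], k=2)
labels = kmeans(embedding, k=2)
print(labels)   # patients 0-2 and 3-5 should fall in different clusters
```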

14.

Imitating Pathologist Based Assessment With Interpretable and Context Based Neural Network Modeling of Histology Images.

Srivastava, Arunima; Kulkarni, Chaitanya; Huang, Kun; Parwani, Anil; Mallick, Parag; Machiraju, Raghu.

Biomed Inform Insights ; 10: 1178222618807481, 2018.

Article in English | MEDLINE | ID: mdl-30450002

ABSTRACT

Convolutional neural networks (CNNs) have gained steady popularity as a tool to perform automatic classification of whole slide histology images. While CNNs have proven to be powerful classifiers in this context, they fail to explain their classifications, as the network-engineered features used for modeling and classification are interpretable only by the CNNs themselves. This work aims at enhancing a traditional neural network model to perform histology image modeling, patient classification, and interpretation of the distinctive features identified by the network within the histology whole slide images (WSIs). We synthesize a workflow which (a) intelligently samples the training data by automatically selecting only image areas that display visible disease-relevant tissue state and (b) isolates regions most pertinent to the trained CNN prediction and translates them to observable and qualitative features such as color, intensity, cell and tissue morphology and texture. We use the Cancer Genome Atlas's Breast Invasive Carcinoma (TCGA-BRCA) histology dataset to build a model predicting patient attributes (disease stage and node status) and the tumor proliferation challenge (TUPAC 2016) breast cancer histology image repository to help identify disease-relevant tissue state (mitotic activity). We find that our enhanced CNN-based workflow increased patient attribute predictive accuracy (~2% increase for disease stage and ~10% increase for node status), and we show experimentally that a data-driven CNN histology model predicting breast invasive carcinoma stages is highly sensitive to features such as color, cell size and shape, granularity, and uniformity. This work highlights the need to understand widely trusted models built using deep learning and adds a layer of biological context to a technique that has, until now, functioned as a classification-only approach.

15.

Proteogenomic Analysis of Surgically Resected Lung Adenocarcinoma.

Sharpnack, Michael F; Ranbaduge, Nilini; Srivastava, Arunima; Cerciello, Ferdinando; Codreanu, Simona G; Liebler, Daniel C; Mascaux, Celine; Miles, Wayne O; Morris, Robert; McDermott, Jason E; Sharpnack, James L; Amann, Joseph; Maher, Christopher A; Machiraju, Raghu; Wysocki, Vicki H; Govindan, Ramaswami; Mallick, Parag; Coombes, Kevin R; Huang, Kun; Carbone, David P.

J Thorac Oncol ; 13(10): 1519-1529, 2018 10.

Article in English | MEDLINE | ID: mdl-30017829

ABSTRACT

INTRODUCTION: Despite apparently complete surgical resection, approximately half of resected early-stage lung cancer patients relapse and die of their disease. Adjuvant chemotherapy reduces this risk by only 5% to 8%. Thus, there is a need for better identifying who benefits from adjuvant therapy, the drivers of relapse, and novel targets in this setting. METHODS: RNA sequencing and liquid chromatography/liquid chromatography-mass spectrometry proteomics data were generated from 51 surgically resected non-small cell lung tumors with known recurrence status. RESULTS: We present a rationale and framework for the incorporation of high-content RNA and protein measurements into integrative biomarkers and show the potential of this approach for predicting risk of recurrence in a group of lung adenocarcinomas. In addition, we characterize the relationship between mRNA and protein measurements in lung adenocarcinoma and show that it is outcome specific. CONCLUSIONS: Our results suggest that mRNA and protein data possess independent biological and clinical importance, which can be leveraged to create higher-powered expression biomarkers.

Subject(s)

Adenocarcinoma of Lung/surgery , Lung Neoplasms/surgery , Proteogenomics/methods , Adenocarcinoma of Lung/pathology , Female , Humans , Lung Neoplasms/pathology , Male

16.

annoPeak: a web application to annotate and visualize peaks from ChIP-seq/ChIP-exo-seq.

Tang, Xing; Srivastava, Arunima; Liu, Huayang; Machiraju, Raghu; Huang, Kun; Leone, Gustavo.

Bioinformatics ; 34(16): 2879, 2018 08 15.

Article in English | MEDLINE | ID: mdl-29672705

17.

Building trans-omics evidence: using imaging and 'omics' to characterize cancer profiles.

Srivastava, Arunima; Kulkarni, Chaitanya; Mallick, Parag; Huang, Kun; Machiraju, Raghu.

Pac Symp Biocomput ; 23: 377-387, 2018.

Article in English | MEDLINE | ID: mdl-29218898

ABSTRACT

Utilization of single-modality data to build predictive models in cancer results in a rather narrow view of most patient profiles. Some clinical facets relate strongly to histology image features (e.g., tumor stage), whereas others are associated with genomic and proteomic variations (e.g., cancer subtypes and disease aggression biomarkers). We hypothesize that there are coherent "trans-omics" features that characterize varied clinical cohorts across multiple sources of data, leading to more descriptive and robust disease characterization. In this work, for 105 breast cancer patients from TCGA (The Cancer Genome Atlas), we consider four clinical attributes (AJCC Stage, Tumor Stage, ER-Status and PAM50 mRNA Subtypes) and build predictive models using three different modalities of data (histopathological images, transcriptomics and proteomics). We then identify critical multi-level features that drive successful classification of patients for the various cohorts. To build predictors for each data type, we employ widely used "best practice" techniques, including CNN-based (convolutional neural network) classifiers for histopathological images and regression models for proteogenomic data. As expected, histology images outperformed molecular features in predicting cancer stages, and transcriptomics held superior discriminatory power for ER-Status and PAM50 subtypes; however, there were a few cases where all data modalities exhibited comparable performance. Further, we identified sets of key genes and proteins whose expression and abundance correlate across each clinical cohort, including (i) tumor severity and progression (incl. GABARAP), (ii) ER-status (incl. ESR1) and (iii) disease subtypes (incl. FOXC1). Thus, we quantitatively assess the efficacy of different data types to predict critical breast cancer patient attributes and improve disease characterization.

Subject(s)

Breast Neoplasms/diagnostic imaging , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Computational Biology/methods , Female , Gene Expression Profiling/statistics & numerical data , Genomics/statistics & numerical data , Humans , Neural Networks, Computer , Proteomics/statistics & numerical data , RNA, Messenger/genetics , Receptors, Estrogen/metabolism , Regression Analysis

18.

Predictive models for pressure ulcers from intensive care unit electronic health records using Bayesian networks.

Kaewprag, Pacharmon; Newton, Cheryl; Vermillion, Brenda; Hyun, Sookyung; Huang, Kun; Machiraju, Raghu.

BMC Med Inform Decis Mak ; 17(Suppl 2): 65, 2017 Jul 05.

Article in English | MEDLINE | ID: mdl-28699545

ABSTRACT

BACKGROUND: We develop predictive models enabling clinicians to better understand and explore patient clinical data along with risk factors for pressure ulcers in intensive care unit patients from electronic health record data. Identifying accurate risk factors of pressure ulcers is essential to determining appropriate prevention strategies; in this work we examine medication, diagnosis, and traditional Braden pressure ulcer assessment scale measurements as patient features. In order to predict pressure ulcer incidence and better understand the structure of related risk factors, we construct Bayesian networks from patient features. Bayesian network nodes (features) and edges (conditional dependencies) are simplified with statistical network techniques. Upon reviewing a network visualization of our model, our clinician collaborators were able to identify strong relationships between risk factors widely recognized as associated with pressure ulcers. METHODS: We present a three-stage framework for predictive analysis of patient clinical data: 1) developing electronic health record feature extraction functions with the assistance of clinicians, 2) simplifying features, and 3) building Bayesian network predictive models. We evaluate all combinations of Bayesian network models from different search algorithms, scoring functions, prior structure initializations, and sets of features. RESULTS: From the EHRs of 7,717 ICU patients, we construct Bayesian network predictive models from 86 medication, diagnosis, and Braden scale features. Our model not only identifies known and suspected high pressure ulcer (PU) risk factors but also substantially increases the sensitivity of the prediction (nearly three times higher than logistic regression models) without sacrificing overall accuracy. We visualize a representative model with which our clinician collaborators identify strong relationships between risk factors widely recognized as associated with pressure ulcers.
CONCLUSIONS: Given the strong adverse effect of pressure ulcers on patients and the high cost of treating them, our Bayesian network-based model provides a novel framework for significantly improving the sensitivity of the prediction model. Thus, when the model is deployed in a clinical setting, caregivers can suitably respond to conditions likely associated with pressure ulcer incidence.

Subject(s)

Bayes Theorem , Electronic Health Records/statistics & numerical data , Intensive Care Units/statistics & numerical data , Models, Statistical , Pressure Ulcer , Adolescent , Adult , Aged , Aged, 80 and over , Female , Humans , Male , Middle Aged , Pressure Ulcer/diagnosis , Pressure Ulcer/epidemiology , Pressure Ulcer/therapy , Risk Factors , Young Adult

19.

Analysis of live cell images: Methods, tools and opportunities.

Nketia, Thomas A; Sailem, Heba; Rohde, Gustavo; Machiraju, Raghu; Rittscher, Jens.

Methods ; 115: 65-79, 2017 02 15.

Article in English | MEDLINE | ID: mdl-28242295

ABSTRACT

Advances in optical microscopy, biosensors and cell culturing technologies have transformed live cell imaging. Thanks to these advances, live cell imaging plays an increasingly important role in basic biology research as well as at all stages of drug development. Image analysis methods are needed to extract quantitative information from these vast and complex data sets. The aim of this review is to provide an overview of available image analysis methods for live cell imaging, in particular the required preprocessing, image segmentation, cell tracking and data visualisation methods. The opportunities provided by recent advances in machine learning (especially deep learning) and computer vision are discussed. This review also includes an overview of the different available software packages and toolkits.

Subject(s)

Image Processing, Computer-Assisted/methods , Machine Learning , Microscopy/methods , Molecular Imaging/methods , Software , Animals , Biosensing Techniques/instrumentation , Biosensing Techniques/methods , Cell Culture Techniques , Cell Tracking/instrumentation , Cell Tracking/methods , Eukaryotic Cells/metabolism , Eukaryotic Cells/ultrastructure , Humans , Image Processing, Computer-Assisted/statistics & numerical data , Microscopy/instrumentation , Molecular Imaging/instrumentation , Signal-To-Noise Ratio

20.

annoPeak: a web application to annotate and visualize peaks from ChIP-seq/ChIP-exo-seq.

Tang, Xing; Srivastava, Arunima; Liu, Huayang; Machiraju, Raghu; Huang, Kun; Leone, Gustavo.

Bioinformatics ; 33(10): 1570-1571, 2017 May 15.

Article in English | MEDLINE | ID: mdl-28169395

ABSTRACT

SUMMARY: We developed annoPeak, a web application to annotate, visualize and compare predicted protein-binding regions derived from ChIP-seq/ChIP-exo-seq experiments using human and mouse cells. Users can upload peak regions from multiple experiments onto the annoPeak server to annotate them with biological context, identify associated target genes and categorize binding sites with respect to gene structure. Users can also compare multiple binding profiles intuitively with the help of visualization tools and tables provided by annoPeak. In general, annoPeak will help users identify patterns of genome wide transcription factor binding profiles, assess binding profiles in different biological contexts and generate new hypotheses. AVAILABILITY AND IMPLEMENTATION: The web service is freely accessible through URL: http://ccc-annopeak.osumc.edu/annoPeak . Source code is available at https://github.com/XingTang2014/annoPeak . CONTACT: gustavo.leone@osumc.edu or kun.huang@osumc.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Chromatin Immunoprecipitation/methods , DNA/metabolism , High-Throughput Nucleotide Sequencing/methods , Software , Transcription Factors/metabolism , Animals , Binding Sites , Humans , Mice , Promoter Regions, Genetic , Protein Binding , Sequence Analysis, DNA/methods
